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Abstract 


We  consider  the  problem  of  minimizing  the  largest  generalized  eigenvalue  of  a  pair  of  sym¬ 
metric  matrices,  each  of  which  depends  affinely  on  the  decision  variables.  Although  this 
problem  may  appear  specialized,  it  is  in  fact  quite  general,  and  includes  for  example  all 
linear,  quadratic,  and  linear  fractional  programs.  Many  problems  arising  in  control  theory 
can  be  cast  in  this  form. 

The  problem  is  nondifferentiable  but  quasiconvex,  so  methods  such  as  Kelley’s  cutting- 
plane  algorithm  or  the  ellipsoid  algorithm  of  Shor,  Nemirovksy,  and  Yudin  are  guaranteed 
to  minimize  it.  In  this  paper  we  describe  relevant  background  material  and  a  simple  interior 
point  method  that  solves  such  problems  more  efficiently.  The  algorithm  is  a  variation  on 
Huard’s  method  of  centers,  using  a  self-concordant  barrier  for  matrix  inequalities  developed 
by  Nesterov  and  Nemirovsky.  (Nesterov  and  Nemirovsky  have  also  extended  their  potential 
reduction  methods  to  handle  the  same  problem  [NN91b].) 

Since  the  problem  is  quasiconvex  but  not  convex,  devising  a  non-heuristic  stopping  cri¬ 
terion  (he.,  one  that  guarantees  a  given  accuracy)  is  more  difficult  than  in  the  convex  case. 
We  describe  several  non-heuristic  stopping  criteria  that  are  based  on  the  dual  of  a  related 
convex  problem  and  a  new  ellipsoidal  approximation  that  is  slightly  sharper,  in  some  cases, 
than  a  more  general  result  due  to  Nesterov  and  Nemirovsky. 

The  algorithm  is  demonstrated  on  an  example:  determining  the  quadratic  Lyapunov 
function  that  optimizes  a  decay  rate  estimate  for  a  differential  inclusion. 

Key  words:  quasiconvex  nondifferentiable  optimization,  generalized  eigenvalue,  linear 
fractional  programming,  analytic  center,  method  of  centers,  interior  point  method,  logarith¬ 
mic  barrier,  Newton  algorithm,  path-following  method,  ellipsoidal  approximations. 
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1  Introduction 


In  this  paper  we  consider  the  problem  of  minimizing  the  maximum  generalized  eigenvalue  of 
a  (symmetric,  symmetric-positive-definite)  pair  of  matrices  that  depend  affinely  on  a  vari¬ 
able  x  that  is  subject  to  some  convex  constraints.  This  problem  includes  linear  fractional 
programming  as  a  special  case.  Our  main  motivation,  however,  is  control  theory,  in  which 
generalized  eigenvalue  minimization  arises  in  many  contexts,  e.g.,  optimal  scaling  of  matri¬ 
ces  with  block-structured  similarity  transformations,  determining  Lyapunov  functions  that 
optimize  some  objective  (such  as  stability  margin),  and  determining  a  joint  Lyapunov  func¬ 
tion  and  state  feedback  that  optimize  some  objective  (see  for  example  [BGFB93,  FBB92, 
FBBEG92,  EGBFB92,  BFBG92]). 

The  problem  is  quasiconvex  and  so  can  be  solved  reliably  by  several  methods,  for  example, 
the  ellipsoid  algorithm  developed  by  Shor,  Nemirovsky,  and  Yudin  [Sho85,  NY83,  BGT81, 
BB91]  or  Kelley’s  cutting  plane  algorithm  [Kel60,  BB91].  In  this  paper  we  describe  an 
interior  point  algorithm  that  solves  the  problem,  and  appears  to  be  very  efficient  compared 
to  these  methods.  We  give  a  simple  proof  of  convergence  for  our  algorithm,  but  we  do  not 
give  a  detailed  complexity  analysis. 

The  same  problem  has  been  considered  by  Nesterov  and  Nemirovsky,  who  have  also 
developed  an  interior  point  algorithm  to  solve  it  [NN91b],  Moreover,  they  give  a  complete 
complexity  analysis  of  their  algorithm. 

Since  the  problem  is  not  convex,  the  problem  of  developing  a  stopping  criterion  or  con¬ 
dition  is  more  complicated  than  for  convex  problems.  (In  convex  problems  duality  theory 
often  gives  us  a  simple  stopping  condition  that  requires  little  extra  computation.)  We  pro¬ 
pose  several  stopping  conditions  that  can  be  used  for  generalized  eigenvalue  minimization. 

When  the  “denominator”  matrix  is  constant,  the  problem  reduces  to  minimizing  the 
maximum  eigenvalue  of  a  symmetric  matrix  that  depends  affinely  on  x.  In  this  case,  the 
problem  is  in  fact  convex  (but  still  nondifferentiable).  Many  researchers  have  considered 
this  problem.  Relevant  work  includes  Cullum  et  al  [CDW75],  Craven  and  Mond  [CM81], 
Polak  and  Wardi  [PW82],  Fletcher  [Fle85],  Shapiro  [Sha85],  Friedland  et  al.  [FN087],  Goh 
and  Teo  [GT88],  Panier  [Pan89],  Allwright  [A1189],  Overton  [Ove88,  Ove92,  OW93,  OW92], 
Ringertz  [Rin91],  Fan  and  Nekooie  [FN92],  and  Fan  [Fan92],  In  [BY89],  Boyd  and  Yang 
use  the  cutting-plane  algorithm  and  Shor’s  subgradient  method  [Sho85]  to  solve  eigenvalue 
minimization  problems  that  arise  in  control  theory.  They  also  describe  a  saddle  point  method 
for  eigenvalue  mimimization  due  to  Pyatnitski  and  Skorodinsky  [PS83]. 

Interior  point  methods  for  eigenvalue  minimization  have  recently  been  developed  by  sev¬ 
eral  researchers.  The  first  were  Nesterov  and  Nemirovsky  [NN88,  NN90b,  NN90a,  NN91a, 
NN93];  others  include  Alizadeh  [Ali92b,  Ali91,  Ali92a],  Jarre  [Jar91a],  and  Vandenberghe 
and  Boyd  [VB93]. 

Of  course,  general  interior  point  methods  (and  the  method  of  centers  in  particular)  have 
a  long  history.  Early  work  includes  the  SUMT  book  by  Fiacco  and  McCormick  [FM68], 
the  method  of  centers  described  by  Huard  et  al.  [LH65,  Hua67],  and  Dikin’s  interior  point 
method  for  linear  programming  [Dik67].  Interest  in  interior  point  methods,  mostly  for  lin- 
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ear  and  quadratic  programs,  surged  in  1979  when  Khachyian  used  the  ellipsoid  method 
developed  by  Shor,  Nemirovsky,  and  Yudin  to  prove  that  linear  programs  can  be  solved  in 
polynomial  time  [Kha79,  GL81].  Interest  surged  again  in  1984  when  Karmarkar  [Kar84] 
gave  his  interior  point  method  for  solving  linear  programs,  which  appears  to  have  very  good 
practical  performance  as  well  as  a  good  worst-case  complexity  bound. 

Since  the  publication  of  Karmarkar’s  paper,  many  researchers  have  studied  interior  point 
methods  for  linear  and  quadratic  programming.  These  methods  are  often  described  in  such 
a  way  that  extensions  to  more  general  (convex)  constraints  and  objectives  are  not  clear. 
However,  Nesterov  and  Nemirovsky  have  developed  a  theory  of  interior  point  methods  that 
applies  to  more  general  convex  programming  problems,  and  in  particular,  problems  involving 
eigenvalue  minimization  and  matrix  inequality  constraints  (see  the  book  [NN93]).  Other 
recent  articles  that  consider  interior  point  methods  for  more  general  convex  programming 
include  Sonnevend  [Son88l,  Jarre  [Jar91bl,  Kortanek  et  al.  [KPY911,  and  the  survey  by 
Wright  [Wri92], 

1.1  Outline 

In  the  remainder  of  section  1  we  describe  the  notation  used  throughout  this  paper,  the 
problem  we  consider  (along  with  the  assumptions),  and  some  duality  results  and  optimality 
conditions  for  our  problem.  In  section  2  we  show  how  many  convex  constraints  can  be  cast  in 
the  form  of  an  affine  matrix  inequality,  and  similarly,  how  many  quasiconvex  objectives  can 
be  expressed  as  maximum  generalized  eigenvalues  of  a  pair  of  matrices  that  depend  affinely 
on  a  variable.  This  justifies  our  claim  that  the  problem  is  much  more  general  than  it  might 
first  appear. 

In  section  3  we  discuss  the  idea  of  the  analytic  center  of  an  affine  matrix  inequality,  and 
in  section  4  we  describe  the  method  of  centers  and  give  a  simple  proof  of  convergence.  In 
the  two  following  sections  we  discuss  some  important  “details”  of  the  method  of  centers: 
nonheuristic  stopping  criteria  and  some  issues  that  arise  in  implementation. 

In  section  7  we  present  an  example:  finding  a  quadratic  Lyapunov  function  for  a  dif¬ 
ferential  inclusion  that  optimizes  a  decay  rate  estimate.  Numerical  results  are  given  for  an 
instance  of  this  problem,  and  compared  to  the  performance  of  the  ellipsoid  algorithm. 

1.2  Notation 

Throughout  this  paper  we  use  the  following  notation.  R  denotes  the  set  of  real  numbers,  Rm 
the  set  of  real  (column)  vectors  with  m  components,  and  RpX9  denotes  the  set  of  real  p  X  q 
matrices.  /  will  denote  the  identity  matrix,  with  size  determined  from  context.  XT  is  the 
transpose  of  the  matrix  or  vector  X ;  for  an  invertible  matrix  we  abbreviate  (Y-1)T  =  (YT)_1 
as  X~T .  J\f(X)  denotes  the  nullspace  of  X.  TrX  is  the  trace  of  a  matrix  X  £  RnXn,  he., 
TrY  =  Xu  +  •  •  •  +  Xnn.  Since  we  will  often  encounter  expressions  of  the  form  Tr (XT) 
with  X  and  Y  symmetric  matrices  (Tr(YY)  is  the  natural  inner  product),  we  will  write 
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it  as  Tr XX.  In  other  words,  matrix  multiplication  has  higher  precedence  than  the  trace 
operator. 

For  symmetric  matrices  X  =  XT ,  Y  =  YT  £  RnXn,  X  <  Y  refers  to  the  partial 
ordering  of  symmetric  matrices  with  respect  to  the  cone  of  positive  definite  matrices,  he., 
zTXz  <  ztY z  for  all  nonzero  z  £  RT  For  matrices  X  and  X,  X  ®  Y  will  denote  the  block 
diagonal  matrix  formed  from  X  and  Y,  he., 


x©x  = 


X  0 
0  Y 


The  largest  eigenvalue  of  a  symmetric  matrix  X  =  XT  £  RnXn  will  be  denoted  Amax(X). 
For  a  matrix  (or  vector)  X,  ||X||  will  denote  the  spectral  norm  or  largest  singular  value  of 

X,  he.,  ||X||  =  (Amax(XTX))  .  (If  X  is  a  vector,  \\X\\  reduces  to  the  Euclidean  norm, 

||X||  =  (^XT X^  ^  .)  ||X||i?  denotes  the  Frobenius  norm  of  a  matrix,  ||X||i?  =  ^TrXTX^  ^  . 

For  a  matrix  X  =  XT  >  0,  X1/2  will  denote  the  symmetric  square-root. 

In  describing  algorithms,  a  superscript  of  the  form  (h),  as  in  x^k\  will  denote  the  value 
of  a  variable  at  the  kth  iteration.  The  symbol  :=  will  denote  assignment. 


1.3  Maximum  generalized  eigenvalue 

The  generalized  eigenvalues  of  the  pair  X  =  XT,  Y  =  YT  >  0,  are  the  roots  of  det(AX  —  X), 
or  equivalently,  the  eigenvalues  of  X_1/,2XX_1/2  (which  of  course  are  real).  Throughout  this 
paper  we  only  encounter  generalized  eigenvalues  of  pairs  of  matrices  X,  Y  with  X  =  XT 
and  Y  =  YT  >  0. 

The  maximum  generalized  eigenvalue  of  the  pair  X,  X,  denoted  Amax(X,  X),  can  be 
characterized  in  several  ways: 


Amax(X,  X)  =  max  {  A  £  R  |  det(AX  -  X)  =  0  } 

=  Amax(x-1/2XX-1/2) 

=  inf  {  A  £  R  |  AX  -  X  >  0  } 

=  sup  |  vTXv  vT Y v  =  1  | 

=  SUP  i  I  U  =  UT  >  0,  U  T2  0 


(1) 

(2) 

(3) 

(4) 

(5) 


The  maximum  generalized  eigenspace  of  the  pair  X,  X  refers  to 

Vmax(X,  Y)^M  (Amax(X,  X)X  -  X) . 


Excluding  0,  these  are  precisely  the  vectors  that  achieve  the  supremum  in  (4),  when  scaled  so 
that  vtY v  =  1.  Similarly,  the  matrices  that  achieve  the  supremum  in  (5)  can  be  described  in 
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terms  of  Vmax(Af,  Y)  as  follows.  Let  iq, .  .  . ,  up,  with  p  >  1,  be  nonzero  vectors  in  Vmax(Af,  Y), 
and  (Ji, dp  >  0.  Then 

U  =  (JiUiu{  +  •  •  •  +  crpupUp  (6) 

satishes  U  =  UT  >  0,  U  ^  0,  TrXU /TrYU  =  Amax(Af,  Y).  Conversely,  any  such  U  can  be 
expressed  as  (6)  for  suitable  choice  of  rq  and  cq.  (Indeed,  we  can  choose  these  vectors  to 
be  orthonormal,  but  we  won’t  need  this  fact.)  Thus,  the  cone  of  matrices  that  achieve  the 
supremum  in  (5)  is  generated  by  the  dyads  uuT  formed  from  u  £  Vmax(Af,  Y). 

On  any  region  in  which  Y  >  0,  Amax(Af,  Y)  is  a  quasiconvex  function  of  the  matrices 
X  =  XT  and  Y  =  YT ,  which  means  that  for  each  A  £  R,  the  sublevel  set 

{  (X,Y)  I  X  =  XT ,  Y  =  YT  >  0,  Amax(X,F)  <  A  } 

is  convex,  since  it  can  be  expressed  as 

{  (X,Y)  \  X  =  XT,  Y  =  Yt  >  0,  XY  —  X  >  0  }. 

Quasiconvexity  can  also  be  characterized  as  follows.  For  any  symmetric  X,  X,  Y  >  0,  Y  >  0, 
and  0  <  9  <  1, 

A max(6X  +  (1  -  6)X,  6Y  +  (  1  -  Q)Y)  <  max{  Amax(X,  Y),  Amax(X,  Y)  }. 

For  fixed  Y  >0,  Amax(Af,  Y)  is  a  convex  function  of  X,  but  in  general  it  is  not  a  convex 
function  of  X  and  Y. 

Whenever  the  dimension  of  Vmax(Af,  Y)  exceeds  one,  Amax(Af,  Y)  is  not  a  differentiable 
function  of  X  and  Y. 

1.4  The  problem 

We  consider  the  optimization  problem  with  variables  x  £  Rm  and  A  £  R  given  by: 

minimize  A  (7) 

A B(x)  —  A(x)  >  0 
B(x)  >  0 
C(x)  >  0 


or  equivalently, 

minimize  Amax(A(x),  B(x)).  (8) 

B(x)  >  0 
C(x)  >  0 

Here,  A,  B}  and  C  are  symmetric  matrices  that  depend  affinely  on  x  £  Rm: 

m  m  m 

A(x)  =  A0  +  ^XiAi,  B(x)  =  B0 +  ^2xiBi,  C(x)  =  C0  +  xtCt,  (9) 

2=1  2=1  2=1 
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where  At  =  Aj ,  Bt  =  Bj  £  RrXr,  and  Ct  =  Cf  £  RsXs. 

The  form  of  the  constraint,  he.,  C(x)  >  0,  may  seem  quite  specialized,  but  we  will  see  in 
section  2  that  a  large  variety  of  constraints  on  x  including,  e.g.,  linear  and  convex  quadratic 
inequalities,  can  be  expressed  in  this  form  with  suitable  C . 

The  optimum  value  of  (8)  will  be  denoted  Aopt: 

Aopt  4  inf  {  A max(A(x),  B(x))  \  B(x)  >  0,  C(x)  >  0  }  .  (10) 


Complex  Hermitian  matrices  are  readily  handled  by  representing  them  in  the  standard 
way  as  real  symmetric  matrices  which  are  twice  as  big.  For  example,  if  A  and  B  in  (8)  are 
complex  Hermitian,  we  form  the  real  symmetric  matrices 


A{x)  = 


5C4(:r)  —  AA(x) 

AA(:r)  3?A(x) 


B(x)  = 


$lB(x)  —  AB(x) 
AB(x)  ?R.B(x) 


and  solve  (8)  with  A,  B}  and  C . 


1.5  Assumptions 

We  make  the  following  assumptions  about  the  data  in  problem  (8): 

1.  The  problem  is  feasible  and  we  are  given  an  initial  feasible  point,  he.,  we  know  A^0) 
and  with  A^B(x^)  —  A(x^)  >  0,  B(x^)  >  0,  and  C(x^)  >  0. 

2.  B  is  bounded  away  from  singularity  on  the  feasible  set,  he.,  we  know  Lin  >  0  such 
that  C(x)  >  0  B(x)  >  bmmI . 

3.  The  feasible  set  is  bounded,  he.,  there  is  some  R  such  that  C(x)  >  0  ||x||  <  R. 

Let  us  briefly  discuss  these  assumptions.  We  can  find  appropriate  A^0)  and  x^°\  or  verify 
that  the  problem  is  infeasible,  by  solving  an  unconstrained  (“phase  I”)  problem,  he.,  by 
minimizing  the  maximum  eigenvalue  of  —(C(x)  ®  B(x))  (using  the  algorithm  described  in 
this  paper,  or  the  more  efficient  methods  for  minimizing  ordinary  eigenvalues  mentioned  in 
section  1).  Similarly,  we  can  fold  an  appropriate  bm in,  or  determine  that  assumption  (2)  does 
not  hold,  by  minimizing  the  maximum  eigenvalue  of  —B(x)  subject  to  C(x)  >  0. 

Assumption  (2)  implies  that  the  constraint  B(x)  >  0  appearing  in  (8)  is  redundant. 
We  can  enforce  the  assumption  (2)  by  augmenting  the  original  constraint  C(x)  >  0  with 
B(x )  >  &min/,  he.,  replacing  C(x)  with  C(x)  ®  ( B(x )  —  bminI )  (adding  this  constraint  may, 
of  course,  change  the  problem). 

Assumption  (3)  implies  that  B  is  bounded  on  the  feasible  set,  he.,  there  is  a  6max  such  that 
C{x)  >  0  B(x)  <  bmeixI.  Assumptions  (1)  and  (3)  imply  that  the  matrices  C\, .  .  . ,  Cm 
are  linearly  independent  (if  not,  {x\ C(x)  >  0}  contains  a  line  passing  through  x^0)). 

Of  course  the  assumptions  (1-3)  imply  that  Aopt  is  finite. 


5 


We  make  one  last  comment  about  the  assumptions.  A  simple  transformation  allows  us  to 
relax  assumptions  (2)  and  (3).  With  assumption  (1)  in  force,  we  can  replace  the  constraint 
C (x)  >  0  with  C (x)  ®  (A ^B(x)  —  A(x))  >  0  without  affecting  the  problem.  (The  additional 
constraint  A  A)  B(x)  —  A(x)  >  0  is  equivalent  to  limiting  the  objective  Amax(A(x),  B(x))  to  be 
smaller  than  A^i,  which  does  nothing  since  is  a  feasible  point  with  objective  less  than 
Ai°i.)  For  this  transformed  problem,  assumption  (2)  becomes 

A ^B(x)  —  A(x)  >  0  and  C(x)  >  0  =>  B(x)  >  bmmI . 

This  same  comment  holds  for  assumption  (3)  as  well.  This  trick  allows  us  to  consider  some 
problems  that  were,  in  original  form,  unconstrained. 

1.6  Duality  and  optimality  conditions 

Consider  first  a  general  symmetric  matrix  function  that  depends  affinely  on  x}  F(x)  = 
Bo  +  YlT=i  xiBi-  Recall  that 

{  ;r  |  F(;r)  >  0  }  =  0  3U  =  UT  >  0,  U  ±  0,  (11) 

Trf/ih  =  0,  z  =  l,...,m, 

TAUFq  <  0. 

This  can  be  seen  as  follows.  {x\ F(x)  >  0}  is  empty  if  and  only  if  the  affine  set  {F(x)\x  £ 
Rm}  does  not  intersect  the  cone  of  positive  definite  matrices.  From  convex  analysis,  this  is 
equivalent  to  the  existence  of  a  linear  functional  that  is  positive  on  the  positive  definite  cone 
and  nonpositive  on  the  affine  set  of  matrices.  The  equivalence  (11)  follows  from  the  fact  that 
the  linear  functionals  that  are  positive  on  the  positive  definite  cone  are  exactly  of  the  form 
ip(X)  =  Tr UX  where  U  is  positive  semidehnite  and  nonzero. 

Applying  (11)  to  F(x)  =  (A B(x)  —  A(x  j)  ®  C(x)  we  have: 

A  <  Aopt  {x\  A  B(x)  -  A{x)  >  0,  C(x)  >  0  }  =  0  (12) 

3U  =  UT  >  0,  V  =  VT  >  0,  U  ®V  ^  0,  (13) 

TrU(XBi  —  Ai)  +  TrVCi  =  0,  z  =  l,...,m, 

TrlJ(\B0  -  A0)  +  TrVC'o  <  0. 

We  will  use  this  result  in  section  5  to  develop  appropriate  stopping  criteria  for  our  algorithm. 
Note  that  we  can  consider  the  problem 

maximize  A  (14) 

U  =  UT  >  0,  V  =  VT  >  0 
Tr  U  +  Trffi  =  1 

TrU(XBi  —  Ai)  +  TAVCi  =  0,  i  =  l,...,m 
TrU(XB0  -  A0)  +  TrVC0  <  0. 
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as  a  sort  of  dual  problem  to  the  quasiconvex  problem  (8).  The  problem  (14),  however,  has 
no  nice  convexity  or  quasiconvexity  properties  (except  that  for  fixed  A,  the  set  of  U  and  V 
that  satisfy  the  constraints  is  convex). 

Let  x°vt  be  any  optimal  point,  he.,  a  limit  of  feasible  points  with  maximum  generalized 
eigenvalue  converging  to  Aopt.  Such  a  point  x°vt  satisfies: 


Amax(A(xopt; 

),B(xopt)) 

= 

A°Pt 

(15) 

A  optB(xopt) 

1 

AT 

0 

> 

0 

(16) 

B(xopt) 

> 

^min^ 

(17) 

C(xopt) 

> 

0 

(18) 

(We  note,  however,  that  conditions  (15)-(18)  can  also  be  satisfied  by  points  that  are  not 
limits  of  feasible  points  and  hence  not  optimal.) 

Now  let  Uopt  and  Vopt  be  a  pair  of  matrices  that  satisfy  the  conditions  in  (13)  for  A  =  Aopt. 
Then  for  all  z, 

TrUopt(XoptB(z)  -  A(z))  +  Tr VoptC(z)  =  /3,  (19) 

where  f3  does  not  depend  on  z,  and  f3  <  0.  In  particular  for  z  =  xopt,  where  xopt  is  any 
optimal  point,  we  conclude  that 

TrUopt(XoptB(xopt)  -  A(xopt))  +  TrVoptC'(a;opt)  =  f3.  (20) 

Each  term  on  the  left-hand  side  of  this  equation  is  the  trace  of  the  product  of  two  nonnegative 
definite  matrices,  and  so  must  be  nonnegative.  So  we  conclude  that  (3  =  0  and  moreover, 
both  of  the  terms  are  zero: 


TrUopt(XoptB(xopt)  -  A(xopt))  =  0,  (21) 

TrV^CXir015*)  =  0.  (22) 

From  (13)  we  know  that  at  least  one  of  Uopt  and  Vopt  is  nonzero.  In  fact,  our  assumptions 
imply  that  Uopt  ^  0.  If  Uopt  =  0,  then  Vopt  satisfies  Tr  VoptCi  =  0,  i  =  0,  ...,ra,  which 
by  (11)  implies  that  the  constraint  C(x)  >  0  is  infeasible. 

We  can  describe  the  matrices  Uopt  and  Vopt  in  terms  of  generalized  eigenvectors  of  A,  B} 
and  C  at  xopt,  as  follows.  From  (21),  Uopt  is  one  of  the  matrices  that  achieves  the  supremumin 
the  characterization  (5),  he.,  Uopt  >  0,  Uopt  ^  0,  and  ArUopt  A(xopt)  lArUopt  B(xopt)  =  Aopt. 
Therefore,  we  can  express  Uopt  as 

Fopt  =  , 

i  =  l 

where  it;  £  Vmax(4l(a:opt),  B(xopt)),  Ui  ^  0,  and  cq  >  0.  Similarly,  from  (22)  we  have 

v*  =  E  W 

8  =  1 

where  ry  £  Af(C(xopt))  and  r8-  >  0.  (Here,  however,  it  is  possible  that  q  =  0,  he.,  Vopt  =  0.) 
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2  Convex  constraints  as  affine  matrix  inequalities 

In  this  section  we  discuss  ways  of  representing  convex  constraints  on  the  variable  x  in  the 
form  of  an  affine  matrix  inequality  C(x)  >  0.  The  idea  that  affine  matrix  inequalities 
can  be  used  to  represent  a  wide  variety  of  convex  constraints  can  be  found  in  Nesterov 
and  Nemirovsky  [NN90b,  NN90a,  NN93]  (who  formalize  the  idea  of  a  “positive  definite 
representable”  function)  and  Alizadeh  [Ali92b,  Ali9 1] . 

2.1  Multiple  constraints 

We  first  note  that  multiple  constraints  on  x,  expressed  as  the  affine  matrix  inequalities 
Ci(x)  >  0,  i  =  1, .  .  .  ,  /,  are  equivalent  to  the  single  affine  matrix  inequality  C\(x)  ®  •  •  •  ® 
Ci(x)  >  0. 

2.2  Linear  constraints 

The  constraint  aTx  <  6,  where  a  £  Rm  and  6  £  R,  is  represented  by  C(x)  >  0,  where 
C(x)  =  b  -  aTx.  (Here  C(x)  £  Rlxl.) 


2.3  Convex  quadratic  constraints 

The  constraint  ||Z(x)||  <  1,  where  Z  is  an  affine  function  from  Rm  into  Rp,  is  represented 


as 


C(x)  = 


I  Z(x) 
Z(x) 


' T  1 


>  0. 


The  ellipsoid  described  by  (x  —  xc)TP  1(x  —  xc)  <  1,  where  P  =  PT  >  0,  can  be  expressed 
in  the  alternate  form 

Prvt  _  rvt 

(x  -  XC)T  1 


C(x)  = 

(this  matrix  is  related  to  the  one  above  by  a  congruence). 


>  0 


2.4  Matrix  norm  constraints 

More  generally,  a  constraint  on  the  norm  of  a  matrix  Z(x)  £  RpX9  that  depends  affinely  on 
x}  he.,  ||Z(x)||  <  1,  is  represented  as 


C(x) 


I  Z(x) 
Z(x)T  I 


>  0. 
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2.5  Algebraic  Riccati  inequality 

The  (convex)  Riccati  constraint 

AtP(x)  +  P(x)A  +  P(x)BR~1BtP(x)  +  Q  <  0, 

where  P(x)  =  P(x)T  is  an  affine  function  of  x,  and  A,  B ,  Q  =  QT ,  R  =  RT  >  0,  are  matrices 
of  appropriate  size,  can  be  expressed  as 

-ATP(x)-P(x)A-Q  P(x)B  1 

BtP(x )  R  \  >  U' 

These  inequalities  arise  in  control  theory [BGFB93]. 


2.6  Schur  complement  constraints 


The  constraints  described  above  are  special  cases  of  constraints  having  a  “Schur  complement 
form” : 

Q(x)  —  S  (x)  R(x)~1  S  (x)T  >  0  and  R(x)  >  0,  (23) 

where  Q(x)  =  Q(x)T ,  S(x)  and  R(x)  =  R(x)T  are  matrices  of  appropriate  size  that  depend 
affinely  on  the  vector  x.  The  constraint  (23)  can  be  represented  as 

r(r]  _  f  Q(x)  S(x)  1 

1  SXxf  R(x)  > 


2.7  Quasiconvex  functions  as  generalized  eigenvalues 

Analogously,  many  quasiconvex  functions  can  be  represented  in  the  form  Amax(A(x),  B(x)) 
(with  some  suitable  constraint  that  ensures  B(x)  >  0).  For  example,  the  maximum  of  two 
functions  expressed  in  this  form  can  be  expressed  in  this  form  by  forming  block  diagonal 
matrices. 

The  sum  of  two  quasiconvex  functions  expressed  in  the  form  Amax(A(x),  B(x))  need  not 
be  quasiconvex,  and  therefore  cannot  in  general  be  expressed  in  the  same  form.  However,  the 
sum  of  the  (convex)  objectives  A maY(di  (x))  +  Amax(A2(®))  is  readily  handled.  The  problem 

minimize  Amax(Ai(z))  +  A  max(A2(x)) 
x  G  Rm 
C(x)  >  0 

is  equivalent  in  the  obvious  way  to  the  problem  with  m  +  2  variables 

minimize  Amax(2q  +  z2) 
x  G  Rm,  aR2 
C(x)  >  0 
z\I  —  Ai{x)  >  0 
z2I  —  A2(x)  >  0 
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Ar 


which  is  of  the  form  (8). 

Several  common  convex  objectives  can  be  expressed  as  ordinary  maximum  eigenvalues, 
he.,  in  the  form  Amax(A(x)).  The  objective  ||Z(x)||2,  where  Z(x)  £  RpX9  is  an  affine  function 
of  x,  is  given  by 

0  Z(xl 
Z(x) 

This  includes  all  quadratic  (q  =  1)  and  (squared)  matrix  norm  objectives. 

The  usual  linear  fractional  objective  is  given  by 

aTx  +  b  T  T 

r  '  i  Amax(a  x  T  o,  c  x  T  clj 
c1  x  +  a 

(where  cTx  +  d  >  0).  So  the  problem  (7)  includes  all  linear,  linear  fractional,  and  quadratic 
programs. 

The  linear  fractional  objective  can  be  generalized  to  a  quasiconvex  “norm  of  matrix 
fractional”  objective  as  follows.  Given  affine  functions  N(x)  =  N(x)T ,  D(x)  =  D(x)T ,  we 
have 

|iV(x)1/2JD(x)-1/2||2  =  Xinax(N(x),D(x)) 

(for  N(x),  D(x)  >  0). 

Using  Schur  complements  we  can  express  several  interesting  convex  and  quasiconvex 
functions  as  maximum  eigenvalues  or  maximum  generalized  eigenvalues.  As  an  example 
consider  the  convex  function 

I  2 


Tr  NixfDix^Nix)  =  N(xf  D(x)~1/2 


(24) 


where  N(x)  £  RpX9  and  D(x)  =  D(x)T  are  affine  functions  of  x  (and  D(x)  >  0).  The 
objective  (24)  can  be  minimized  by  introducing  a  “slack  matrix”  Y  £  R9Xh 


minimize 

x  £  Rm,  Y  £  R9XU  A  £  R 

TrU  <  A 


A. 


Y  N{  xf 
N(x)  D(x) 


>  0 


The  function  (24)  includes  as  a  special  case  the  quadratic-over-linear  objective 
|| Ax  +  b\\/(cTx  +  d).  Note  also  that  by  substituting  Y  <  XI  for  TrU  <  A  we  can  mini¬ 
mize  the  convex  function 

Amax  (NixfDix^Nix))  =  \\n(x)tD(x)-^2\\2  . 

As  a  hnal  example  we  consider  the  condition  number  of  a  positive  definite  matrix  A  that 
depends  affinely  on  x,  which  is  readily  minimized  as  follows: 

minimize  Amax(A(x),  jjJ). 

A{x)  —  jjJ  >  0,  n  >  0 
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3  Analytic  center  of  an  affine  matrix  inequality 

Throughout  this  section,  which  is  completely  independent  of  the  others,  we  consider  a  general 
affine  matrix  inequality  F(x)  >  0,  where 


F(x)  =  F0  +  ^x%F% 

i  =  l 

and  Fi  =  Fj  £  RnXT  We  will  assume  that  the  matrices  F\, ,  Fm  are  linearly  independent. 
We  denote  the  feasible  set  by  X: 

X  =  {  x  £  Rm  |  F(x)  >  0  }  . 


3.1  A  barrier  function  for  X 

The  function 


x  A  [  logdetT(x)  1  x  £  X 

9\x)  =  1  j  v 

v  ;  ^  x  4  X 


oo 


(25) 


is  finite  if  and  only  if  x  £  X,  and  becomes  infinite  as  x  approaches  the  boundary  of  X,  he.,  it 
is  a  barrier  function  for  X.  There  are  many  other  barrier  functions  for  X  (for  example,  trace 
can  be  substituted  for  determinant  in  (25)),  but  this  one  enjoys  many  special  properties.  In 
particular,  when  x  £  X,  it  is  analytic  and  strictly  convex. 

We  first  give  formulas  for  the  gradient  g(x)  and  Hessian  H(x)  of  <f  at  x  £  X.  It  is  readily 
shown  (see  appendix  A)  that 


gfx)  =  — Tr  F(x)  1Ft 

=  -Tr  F(x)-1/2FtF(x)-1/2 


for  i  =  1, .  .  .  ,  to.  Similarly, 


Htl(x)  =  TrF(x)  FtF(x)  Fj 

=  Tr  (F(ar)-1/2FiF(ar)-1/2)  (F(x)-1/2FjF(x)-1/2 


(26) 

(27) 


(28) 

(29) 


for  i,j  =  1, .  .  . ,  to. 

From  (29)  we  can  verify  that  <f  is  strictly  convex  for  ifX.  For  x  £  X  and  y  £  Rm, 

m 

yTH(x)y  =  J2  ViVj ^  {f(x)~1/2 FF(x)-1/2)  (t(x)-1/2FjF(x)-1/2)  (30) 

i,j= 1 

/  /  m  \  \  2 


=  Tr  Fix)-1'2  [Y^ViFi  F(, 


1-1/2 


\8  =  1 


F(x 


,-1/2 


1-1/2 


V8  =  l 


>  0 


(31) 

(32) 
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which  establishes  that  <f>  is  convex  in  x.  From  (32)  we  see  that  yT H(x)y  =  0  if  and  only  if 
Y1T=  i  =  0-  By  independence  of  Tf, . . . ,  Fm,  we  conclude  that  H(x)  >  0,  he.,  </>  is  strictly 

convex. 

For  future  use  we  note  a  few  more  important  formulas.  From  (27)  we  see  that  for  itX 
and  z  G  Rm  we  have 


Tr  F{x)-1/2F(z)F(x)-1/2 


From  (29)  we  have: 


F(x)-1/2F(z)F(x)-1/2 


(m 

F{x)  +  ^2(Zi  -  Xi)Ft 

i  =  l 

n  —  g(x)T (z  —  x). 


F(x)~1/2 


(33) 

(34) 


Tr  ( F(x )“1/2  ^F(x)  +  f>8  -  Xi)Fj  F{x)-^2j  (35) 

(z  —  x)tH(x)(z  —  x)  —  2  g(x)T(z  —  x)  +  n.  (36) 


The  barrier  function  ^  is  bounded  below  if  and  only  if  X  is  bounded.  The  “if”  part  is 
clear.  To  see  the  “only  if”  part,  suppose  that  X  is  unbounded.  Since  it  is  convex  it  must 
contain  a  ray,  say  {x0  +  av\a  >  0},  where  v  ^  0.  Since  F(x)  >  0  along  this  ray  we  conclude 
that 

m 

8  =  1 

By  independence  of  Tf, .  .  .  }Fm}  F  is  nonzero.  It  follows  that  detT(xo  +  an),  which  is  a 
polynomial  in  a  with  degree  equal  to  the  rank  of  F,  grows  at  least  linearly  with  a.  Therefore, 
<(>  is  unbounded  below  on  the  ray. 


3.2  Analytic  center 

We  suppose  now  that  X  is  nonempty  and  bounded.  From  the  discussion  above  we  conclude 
that  (f)  has  a  unique  minimizer,  which  we  denote  x*\ 

x*  =  argmin  (f>(x).  (37) 

x 

We  refer  to  x*  as  the  analytic  center  of  the  affine  matrix  inequality  F(x)  >  0.  Equivalently, 

x*  =  argmax  detT(x),  (38) 

x  G  X 

that  is,  F(x*)  has  maximum  determinant,  among  all  positive  definite  matrices  of  the  form 
F(x).  Note  that  the  analytic  center  is  invariant  with  respect  to  congruence  transformations, 
he.,  the  analytic  center  of  F(x)  >  0  is  the  same  as  the  analytic  center  of  ZT F(x)Z  >  0  for 
any  nonsingular  matrix  Z . 
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From  (27)  we  see  that  x* 

is  characterized  by 

0 

* 

1 

II 

O 

II 

I—1 

(39) 

or  equivalently 

TrF(x*)  1F(x)  =  n,  for  all  x 

(40) 

(since  the  left-hand  side  is  independent  of  x,  and  the  right-hand  side  is  its  value  at  x*). 
Thus,  F(x*)~ 1  is  orthogonal  to  the  span  of  Tf, .  . . ,  Fm. 

The  definition  (37)  of  the  analytic  center  of  an  affine  matrix  inequality  follows  Nesterov 
and  Nemirovsky  [NN93]  (see  also  Sonnevend  [Son91]).  It  agrees  with  the  usual  definition 
of  the  analytic  center  of  a  set  of  linear  inequalities  (see  e.g.,  Sonnevend  [Son86]),  aj x  <  bi , 
i  =  1, .  .  .  ,  n  (which  can  be  represented  as  an  affine  matrix  inequality  with  diagonal  matrices). 
In  this  case,  x*  maximizes  among  feasible  points  Wf=1{bi  —  ajx),  or  equivalently,  the  product 
of  the  distances  to  the  constraint  planes  aj x  =  bi. 


3.3  Ellipsoidal  approximations 


The  level  curves  of  the  barrier  function  <f  give  a  smooth  approximation  of  the  shape  of  the 
boundary  of  X,  which  of  course  need  not  be  smooth.  Near  x*  the  shape  of  these  level  curves 
is  determined  by  H(x*),  so  it  seems  plausible  that  the  ellipsoids  centered  at  x*  and  with 
shape  determined  by  H(x*)  should  give  a  good  quadratic  approximation  of  the  shape  of  X. 
Alternatively,  it  seems  that  X  should  be  reasonably  well  conditioned  in  the  coordinates  given 
by  x  =  H(x*)~1l2x. 

This  intuition  is  correct.  The  following  inner  and  outer  ellipsoidal  approximations  hold 
for  X: 


C  X  C  £, 


^out? 


where  the  ellipsoids  £-in  and  £out  are  given  by 


£in  =  {ier 
£out  ^  {x  G  Rm 


x  —  x*)T H(x*)(x  —  x*)  <  1 1 , 
x  —  x*)T F[(x*)(x  —  x*)  <  n(n 


(41) 

(42) 


A  proof  is  given  in  appendix  B.  The  inner  ellipsoidal  approximation  holds  for  a  general  class 
of  barrier  functions  (called  self-concordant) ,  which  includes  our  barrier  function  (25),  and  is 
given  in  [NN88,  NN93].  The  outer  approximation  (42)  is  similar  to  an  outer  approximation 
given  by  Nesterov  and  Nemirovsky,  which  holds  for  these  more  general  (self-concordant) 
barriers. 


3.4  Nesterov  and  Nemirovsky’s  Newton  algorithm 

Newton’s  method,  with  appropriate  step  length  selection,  can  be  used  to  efficiently  compute 
x *,  given  an  initial  point  in  X.  We  consider  the  algorithm: 

xA+B  :=  XF)  -  (43) 
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where  a is  the  damping  factor  of  the  kth  iteration. 

In  [NN93],  Nesterov  and  Nemirovsky  give  a  simple  step  length  rule  appropriate  for  the 
general  class  of  self- concordant  barrier  functions  mentioned  earlier,  along  with  a  complete 
convergence  analysis  and  sharp  bounds  on  the  number  of  iterations  required  to  compute  the 
analytic  center  to  within  a  given  accuracy,  starting  from  a  given  initial  feasible  point.  We 
refer  the  reader  to  [NN93]  for  details  of  this  generalization. 

Their  damping  factor  depends  on  a  quantity  which  they  call  the  Newton  decrement  of  </ 
at  x: 


8(x)  = 


H(x)-1/2g(x) 


(The  name  comes  from  the  observation  that  8(x)2/2  is  the  difference  between  <f>(x)  and  the 
minimum  value  of  the  quadratic  approximation  of  </  at  x.  Alternatively,  S(x)  is  the  length 
of  the  Newton  step  —H(x)~1g(x)  measured  in  the  norm  induced  by  the  Hessian  H(x).)  The 
Nesterov-Nemirovsky  damping  factor  is: 


m  1 1  if  aw)  <  1/4 

■  \  1/(1 +  h(x(fc)))  if  s(x^y)  >  i/4 


(44) 


Nesterov  and  Nemirovsky  show  that  this  step  length  always  results  in  £  X  (the 

inner  ellipsoidal  approximation  in  appendix  (B)  shows  that  £  X  provided  a < 

1  /6(x^)).  Moreover,  for  8(x^)  <  1/4,  we  have  h(a/fc+1))  <  2 h(a/fc))2,  he.,  the  algorithm 
converges  quadratically  once  we  start  taking  undamped  Newton  steps.  They  show  that 
whenever  8{x^)  >  1/4,  <f>(x  {k) )  -  4>{  x(k+1^)  >  c,  where  c  is  some  absolute  constant.  Using 
this  fact  they  bound  the  number  of  iterations  required  to  reach  the  region  of  quadratic 
convergence. 

Their  analysis  holds  for  step  length  given  by  exact  line  search,  he., 

a ^  :=  argmin  cj)  (^x^  —  aH(x(-k'>)~1g(x(-k'>fj  , 
a 


since  the  reduction  of  </  while  8  >  1/4  must  exceed  the  absolute  constant  c  guaranteed  using 
the  step  length  rule  (44).  (See  section  6.5  for  a  discussion  of  exact  line  search.) 


3.5  A  least-squares  interpretation 


The  undamped  Newton  step  —H(x)  kg(x)  can  be  interpreted  as  the  solution  of  an  appro¬ 
priate  weighted  least-squares  problem: 


H(x)  1g(x)  =  argmin 

F(x)-1/2F(x  -  v)F(x)-1/2 

v  £  Rm 

=  argmin 

m 

I  -Y.^F(x)-1I2F1F(x)-1I2 

v  £  Rm 

i= 1 

(45) 

(46) 
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Thus,  the  Newton  step  at  x  is  given  by  projecting  /  onto  the  span  of  the  normalized  matrices 

F(x)-1l‘2FlF(x)-1l\ 

We  can  give  a  rough  interpretation  of  this  result.  We  are  trying  to  make  F(z)  “large”  (as 
measured  by  the  determinant).  To  do  this,  we  first  normalize  the  problem  by  a  congruence 
transformation  (multiply  each  Fi  on  the  left  and  right  by  F(x)~ 1//2),  so  that,  in  effect,  we 
have  F(x)  =  /.  Now  we  find  the  “smallest”  F(z),  as  measured  by  the  Frobenius  norm.  Let 
us  call  the  minimizer  xsman.  The  Newton  step  is  then  given  by  the  opposite  of  the  step  from 
x  to  a: smau.  (Roughly  speaking,  if  stepping  from  x  to  xsman  makes  F  “smaller”,  then  stepping 
in  the  opposite  direction  should  make  F  “larger”.) 

The  result  can  also  be  seen  as  follows.  Suppose  the  problem  has  been  normalized  by 
a  congruence  transformation  so  that  F(x)  =  /.  Now  consider  the  two  functions  <f>(x)  = 
log  det  T(x)_1  and  ip(x)  =  |||T(x)|||r.  From  the  formulas  for  the  gradient  and  Hessian  of  <f> 
(with  F(x)  =  I)  we  see  that  the  gradients  of  <p  and  ip  at  x  are  the  same,  except  for  a  change 
of  sign,  and  the  Hessians  are  identical.  Therefore  the  Newton  step  for  <p  is  the  negative  of 
the  Newton  step  for  ip.  Since  ip  is  quadratic,  its  Newton  step  is  the  difference  between  x  and 
its  minimizer. 

The  Newton  decrement  at  x  is  related  to  the  distance  between  /  and  the  span  of  the 
normalized  matrices: 


n  —  8(x)2 


min 

v  G  Rm 


I 


YJv.F(x)-1l2FlF(x) 

8  =  1 


(47) 


Equivalently, 


8{x)  = 


YJvlF(x)-1l2FlF(x)-1l2 


8  =  1 


(48) 


where  v  =  —  F[(x)~1g(x)  is  the  Newton  step.  These  results  follow  from  the  formula  (36) 
noted  in  section  3.2. 

This  least-squares  interpretation  of  the  Newton  step  generalizes  a  well  known  fact  for 
the  linear  inequalities  aj x  <  bi,  i  =  1, .  . .  ,  n.  In  this  case  the  Newton  step  is  given  by  the 
diagonally  weighted  least-squares  problem 


—H(x)  1g(x)  =  argmin 


T 

(l:  v 


v  G  Rm  8=1 


b;  —  aj  x 


-  1 


4  The  method  of  centers 


We  now  consider  again  the  problem  (7): 

minimize  A. 

A B(x)  —  A{x)  >  0 
C(x)  >  0 

Let  n  =  r  +  s,  so  that  (A B  —  A)  ®  C  £  RnXn  (recall  that  A,  B  £  and  C  £  RsXs). 
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4.1  Path  of  centers 

The  assumptions  of  section  (1.5)  imply  that  for  A  >  Aopt,  the  set  { m  |  ( A B(x)  —  A(x))  ®  C(x)  > 
0}  is  nonempty  and  bounded;  therefore  the  analytic  center  of  the  inequality  (XB(x)  —  A(x))® 
C(x)  >  0  is  well  defined.  We  will  denote  this  analytic  center  by  x*(A)  when  we  need  to 
emphasize  its  dependence  on  the  parameter  A.  To  simplify  notation  we  will  write  x*(A)  as 
x*  when  A  is  understood. 

From  (39)  we  see  that  x*  is  characterized  by 

Tr(A5(x*)-A(x*))“1(A58-A8)  +  TrC'(x*)“1C'8  =  0,  i  =  (49) 

The  curve  given  by  x*(A)  for  A  >  Aopt  is  called  the  path  of  centers.  It  can  be  shown  that 
it  is  analytic  and  has  a  limit  as  A  — >  Aopt,  which  we  denote  x°vt  (see  e.g.}  [FM68]).  x°vt  is 
optimal,  since  for  all  A  >  Aopt,  x*(A)  is  feasible  and 

Aopt  <  Amax(A(x*(A)),  i?(x*(A)))  <  A. 

(There  may  be  other  optimal  points  too.)  Since  x°vt  is  optimal  it  satisfies  the  conditions 
given  in  (15)-(18). 


4.2  A  dual  bound  on  the  path  of  centers 

Let  us  fix  A  >  Aopt.  Let  A*  denote  A(x*(A)),  and  similarly  for  B*  and  C*. 

We  will  show  that 

A  —  Aopt  <  rj  (X  —  Amax(A*,  B*))  (50) 

where  rj  =  nbiaax/bia in  (recall  that  n  =  r  +  s  is  the  size  of  (A B  —  A)  ®  (7,  and  Lin  and  6max 
are  defined  in  section  1.5).  We  can  put  (50)  in  the  form: 


Amax(4T,  B*)  -  Aopt  <  h  -  Ij  (A  -  Aopt). 


(51) 


This  equation  shows  that  the  maximum  generalized  eigenvalue  at  the  analytic  center  of 
(A B  —  A)  ®  C  >  0  is  guaranteed  to  be  a  fixed  fraction  closer  to  Aopt  than  A. 

Define 

U  =  (XB*-A*)~\  V  =  C*~\ 

From  (49)  we  see  that 

TrU(XB(z)  -  A(z))  +  TrVC(z)  =  n  (52) 

for  all  2:  (c/.  (19)),  so  in  particular 

TrU(XB(xopt)  -  A(xopt))  +  Tr VC(xopt)  =  n.  (53) 

Since  V  >  0  and  C(x°vt)  >  0,  Tr VC(x°vt)  >  0,  so  we  have 

ATr UB(xopt)  -n<  Tr UA(xopt).  (54) 
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Since  U  >  0  and  B(xopt)  >  bmmB  Tr lJB(xopt)  >  0  and  therefore 


A  - 


n 


TrUB(xopt )  “  TrUB(xop\ 
(the  second  inequality  uses  (5)).  Thus  we  have 

A  -  Aopt  < 


<  ?^(.a;OPt?  <  A max(A(x°pt),B(x°pt))  =  Aopt 


n 


TrUB(; 


,  opt  A 


< 


n 


bm-mTrU 


(55) 


(56) 


(The  second  inequality  uses  Tr UB(xopt)  >  6minTrf/,  which  follows  from  B(xopt )  >  bminI). 
Now  we  note  that 


Tr U  =  Tr(A B*  -  A*)”1  > 


1 


(A  -  Amax(A*, i?*))||i?*|| 

which  we  prove  in  appendix  C.  Finally,  noting  that  ||i?*||  <  6max,  we  have 

A  -  Aopt  <  (n6max/6min)( A  -  Amax(A*,  B*)) 


(57) 


which  is  the  desired  result. 

This  is  the  simplest  dual  bound  for  the  objective  that  can  be  obtained;  in  section  5  we 
derive  more  complicated,  but  better,  bounds. 


4.3  Basic  algorithm 


Perhaps  the  simplest  optimization  algorithm  based  on  the  notion  of  analytic  center  is  the 
method  of  centers  due  to  Lieu  and  Huard  [LH65,  Hua67].  We  describe  here  a  simple  variation 
on  the  method  of  centers. 

The  algorithm  is  initialized  with  A(0)  and  with  A^B(x^)  —  A(x^)  >  0  and  C(x^)  > 
0,  and  proceeds  as  follows: 


x(k+1> 


(1  —  0)Amax(.4(W),  i?(W))  +  0A|fc| 
i*(A(‘+1)) 


(58) 

(59) 


where  9  is  a  parameter  with  0  <  9  <  1. 

The  classic  method  of  centers  is  obtained  with  9  =  0.  In  this  case,  however,  x ^  does  not 
(quite)  satisfy  the  new  inequality  A^k+1^  B{x)  —  A(x)  ®  C(x)  >  0.  With  9  >  0,  however,  the 
current  iterate  x ^  is  feasible  for  the  tightened  inequality  ^A(fc+1)i?(;r)  —  A(x)^  ®  C(x)  >  0, 
and  therefore  can  be  used  as  the  initial  point  in  computing  the  next  iterate  ;r*(A(fc+1)). 

We  now  give  a  simple  proof  of  convergence.  From  (51),  we  have 

Amax(A(x(fc)),  B(x W))  -  Aopt  <  (l  -  ^  (AW  -  Aopt)  .  (60) 
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Subtracting  Aopt  from  both  sides  of  (58)  yields 


X(k+1)  _  Aopt  =  ^  _  0)(Amax(A(x(fc)),  B{ x^j)  -  Aopt)  +  0(A(fc)  -  Aopt). 
Substituting  (60)  into  this,  we  have 

,\(‘+D  _  yv  <(l-  — 4)  (A<‘>  -  Al>pt), 


(61) 


so  that 


X(k)  _  Aopt  <  ^f  _  (A(0)  _  Aopt^ 


Thus,  converges  to  Aopt  at  least  geometrically. 


5  Stopping  criteria 


5.1  Objective  duality  gap 

From  (57),  we  see  that  the  stopping  criterion 

A(fc)  -  Amas(A(xW),h(rW))  < 

^^max 

guarantees  that  on  exit,  A ^  —  Aopt  <  e,  and  therefore 

Amax(d(xW),5(xW))-Aopt<e. 


This  simple  stopping  criterion  has  essentially  no  computational  cost,  since  Amax(A(;r(fc)),  B(x 
must  be  computed  to  hnd  A(fc+1)  anyway.  In  this  section  we  investigate  better  lower  bounds 
on  Aopt  that  can  be  obtained  with  a  little  more  computation. 

Using  the  notation  of  section  4.2,  we  derive  from  (53)  the  inequality 

A  _  AoPt  <  »  ~  TrVC(x^) 

~  Tr  UB(x°Pt) 

The  idea  is  to  derive  some  computable  upper  bounds  on  the  right-hand  side. 

Let  us  list  some  information  we  have,  once  we  have  computed  x*(A): 


C{xopt ) 

> 

o, 

(63) 

B(xopt) 

> 

(64) 

xopt 

e 

<?out 

(65) 

where  £out  is  the  outer  ellipsoid  given  in  (42). 
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n 


(66) 


Using  (63)  and  (64),  we  derive  the  bound 


A 


Aopt  < 


Tr  U 


(which  is  always  better  than  the  simple  bound  (50)). 

Another  bound  can  be  derived  using  (65)  and  the  inequality  (62): 


A 


A°Pt  < 


max 

z  C  U ,  1 1 1 


n-TrVC(z) 
Tr  UB(z) 


(67) 


Note  that  (n  —  TrV C (z)) / TrU B (z)  is  a  linear  fractional  form  in  z .  Therefore  the  right-hand 
side  of  (67)  is  readily  computed — there  is  a  “closed  form”  expression  for  the  maximum  of 
a  linear  fractional  form  over  an  ellipsoid,  which  is  derived  in  appendix  D.  The  bound  (67) 
has  one  major  drawback,  however:  it  can  be  worse  than  the  simple  bound  (50).  It  can  even 
happen  that  the  hyperplane  {z\HrUB(z)  =  0}  intersects  the  ellipsoid  £out,  in  which  case  the 
right-hand  side  of  (67)  is  infinite. 

This  problem  can  be  circumvented.  From  (64)  we  know  that  Tr UB(xopt)  >  6minTrU. 
Hence  x°vt  lies  in  the  halfspace  {z\r£rU B(z)  >  6minTrU}.  Similarly,  from  (63)  we  know  that 
TrUC'(xopt)  >  0,  he.,  x°vt  lies  in  the  halfspace  {z\rYrVC{z)  >  0}.  Therefore  we  can  localize 
x°vt  to  the  intersection  of  £out  and  these  two  halfspaces. 

A  bound  that  uses  this  information  is: 


A 


Aopt  <  max 

-  c  A, 1 1 1 

n  —  Tr VC(z)  <  n 
Tr UB(z)  >  bminTrU 


n  -  Tr VC(z) 
Tr  UB(z) 


(68) 


This  bound  is  always  better  than  all  the  bounds  described  so  far.  To  compute  it  requires  the 
solution  of  the  following  problem:  maximize  a  linear  fractional  form  over  an  ellipsoid,  subject 
to  an  upper  bound  on  the  numerator  and  a  (positive)  lower  bound  on  the  denominator.  This 
problem  also  has  a  “closed  form”  solution,  given  in  appendix  E.  This  solution  is  harder 
to  describe  than  in  the  unconstrained  case.  Computing  it,  however,  requires  essentially  no 
additional  effort  compared  to  the  unconstrained  problem. 


5.2  Constraint  duality  gap 

We  continue  to  use  the  notation  of  section  4.2.  In  this  section  we  show  that 

A  <  min  \max(A(x),  B(x)).  (69) 

C(x)  > 

This  means  that  A  is  a  lower  bound  on  the  minimum  value  of  the  maximum  generalized 
eigenvalue  of  the  pair  (A,H),  subject  to  the  tightened  constraint  C(x)  >  n\m-m(C*) ■  If  the 
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constraint  C(x)  >  0  is  active  at  x°vt  for  the  original  problem,  then  Amin((7*)  converges  to 
zero  as  A  approaches  Aopt.  The  result  (69)  shows  that  in  this  case,  the  stopping  criterion 

XmUC(x^k)  j)  <  e/n 

guarantees  that  on  exit,  x ^  is  e-optimal  for  the  “tightened”  problem 

minimize  Amax(A(x),  B(x)). 

C(x)  >  el 

To  show  (69),  recall  that 

TrU(\Bi  -  Ai)  +  Tr VC\  =  0,  i  =  1, . . . ,  m  (70) 

and 

TrU{\B0  -  A0)  +  TrVC0  =  TrU(\B*  -  A*)  +  TryC'*  =  n. 

Let  fj,  =  Amin ((7*).  Then 

Tr*7(A£0  -  A0)  +  Tr V{C0  -  fil)  =  n  -  fiTrV. 

From 

71 

Try  =  TrC1*-1  >  - — - 

Amin(y*) 

we  conclude 

TrU(XB0  -  A0)  +  Tr V(C0  -  /il)  <  0.  (71) 

Now  note  that  (70)  and  (71)  establish  that  the  affine  matrix  inequality 

(A B(x)  —  A{x))  ®  (C(x)  —  fil)  >  0 

is  infeasible  (by  the  duality  result  (11)).  This  establishes  (69). 

6  Some  notes  on  implementation 

In  this  section  we  briefly  mention  some  of  the  issues  that  arise  in  implementing  the  method 
of  centers. 

6.1  Problem  structure 

In  many  problems,  the  matrices  A,  B}  and  (7,  and  hence  F  =  (A B  —  A)  ®  C  have  a  block 
diagonal  structure,  say, 

I< 

Ft  e0rxn7 

j=i 
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where  n\  +  •  •  •  +  fix  =  n.  Moreover,  each  of  these  K  blocks  may  have  one  of  the  special 
structures  mentioned  in  section  2,  e.g.,  the  special  structure  that  corresponds  to  a  quadratic 
constraint. 

The  choice  of  method  used  to  compute  the  Newton  step  depends  on  how  much  of  the 
problem  structure  we  choose  to  exploit.  As  far  as  we  know,  there  is  not  a  simple  description 
of  a  “best”  method  that  exploits  all  of  the  structure. 

For  future  reference  we  note  an  inequality  relating  m  and  Ylf=i  n<]-  Since  the  dimension 
of  the  set  of  symmetric  matrices  in  0^  is  Ylf=i  nj{nj  +  l)/2  and  the  matrices  F3  are 

independent  (otherwise  the  feasible  set  contains  a  line,  violating  the  assumptions),  we  have 

I< 

m<J2nj{n3  +  l)/2.  (72) 

j= i 

6.2  Normalizing  with  Cholesky  factors 

In  numerical  computations  based  on  the  formulas  of  section  3,  it  is  more  convenient  to  use 
a  triangular  factor  of  F(x)~ 1  instead  of  the  symmetric  square-root  F(x)~x^2  that  appears 
throughout  section  3.  All  of  the  formulas  of  that  section  are  readily  modified  to  use  triangular 
factors  rather  than  F(x)~x^2. 

Let  L  be  the  Cholesky  factor  of  F(x),  he.,  L  is  lower  triangular  with  LLT  =  F(x).  The 
gradient  of  ^  is  given  by 


gt(x)  =  -Tr  F(x)~1Ft 

(73) 

=  — Tr  L~TL-xFt 

(74) 

=  — Tr  L~XF%L~T , 

(75) 

and  the  Hessian  is  given  by 

Hij(x ) 

=  Tr  F{x)-xFtF{x)-xF3 

(76) 

=  Tr  L~t  L~x  FtL~T  L~x  F3 

(77) 

=  Tr  ( L~xFtL~T )  ( L-xF,L~t )  . 

(78) 

Similarly,  the  least-squares  characterization  of  the  Newton  step,  given  by  formula  (46),  be¬ 
comes 


-  H(x)  xg(x) 


argmin 
v  G  Rm 


I -'EviL-'FiL-7 

i  =  l 


(79) 


(this  follows  from  the  fact  that  there  is  an  orthogonal  matrix  Q  such  that  F(x)  x!2  =  QL  1  = 
L~TQT ,  so  that  F(x)-1l2FlF(x)-1l2  =  QL~X FtL~T QT). 

In  other  words,  the  congruence  F  :=  L~x F L~T  normalizes  the  problem  so  that  F(x)  =  F 
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6.3  Full  blocks 


Suppose  first  that  the  blocks  in  F  are  “full”  (or,  we  choose  to  ignore  any  structure  the 
individual  blocks  in  F  may  have).  Of  course,  T_1  will  have  the  same  block  structure  as  F . 
Let’s  give  a  rough  operation  count  for  computing  the  Newton  step  at  a  given  x.  We  will 
ignore  constant  factors  and  keep  only  dominant  terms. 

Forming  F(x)  given  x  costs  m^n2.  We  can  compute  T_1  by  Cholesky  factorization  of 
each  block  of  F(x)  and  then  inversion.  The  cost  is  )T)n3.  Normalizing  the  problem,  he., 
forming  L~T FiL-1 ,  costs  TO)T)n3.  This  cost  dominates  so  far. 

We  suppose  first  that  we  compute  the  Newton  direction  by  forming  g(x)  and  H(x)  and 
solving  H(x)v  =  —g(x).  Forming  g(x)  costs  mu,  and  forming  H(x)  (which  is  the  Gram 
matrix  of  the  normalized  Fi)  costs  m2J2n]-  Finding  v  then  costs  to3.  The  dominant  term 
is  thus  to2  )T)  n2  (since  by  (72),  to  <  )T)n2). 

Now  suppose  that  we  compute  the  Newton  direction  by  solving  the  least-squares  prob¬ 
lem  (79),  which  has  to  variables  and  (ignoring  constant  factors)  )T)  n2  “equations”.  Using 
for  example  QR  factorization,  the  cost  is  to2  )T)n2,  which  is  the  same  cost  as  forming  the 
gradient  and  Hessian  and  solving  for  the  Newton  direction.  (Computing  the  Newton  step 
via  QR  factorization  will  have  better  numerical  properties,  however,  since  we  don’t  “square 
up,”  he.,  form,  the  Hessian.) 

Therefore,  the  total  operation  count  for  one  step  of  the  Newton  method  is  of  order 
max{ra2  )T)  n2,  to  )T)  n3  }. 

6.4  Exploiting  internal  block  structure 

We  can  exploit  additional  structure  that  the  blocks  may  have  to  reduce  the  computation 
required  for  the  Newton  step.  As  an  example,  consider  a  single  block  that  arises  from  the 
quadratic  constraint  || Ax  —  b\\  <  1,  where  A  £  RNxra  and  is  full  rank.  We  may  assume  that 
N  <  TO. 

The  block  associated  with  this  constraint  is 

I  Ax  —  b 
( Ax  —  b)T  1 

(we  ignore  for  now  the  other  blocks  in  F(x)).  Suppose  that  we  use  the  method  described 
above  in  section  6.3,  he.,  normalize  and  then  solve  a  least-squares  problem.  If  we  treat  this 
block  as  full,  it  incurs  a  cost  of  max{TO2iV2,  rriN3}  =  m2N 2  for  one  Newton  step.  We  will 
see  that  by  exploiting  the  special  structure,  this  can  be  reduced  to  to2W,  or  even  to2,  along 
with  some  initial  precomputation.  For  a  quadratic  constraint  of  high  rank  (he.,  N  significant 
compared  to  to),  this  factor  of  N2  is  significant. 

The  barrier  term  for  the  constraint  ||  Ax  —  b\\  <  1  is 

logdet  iQuad^)-1  =  log ( 1  -  yTy) 


U|||;l(|  (  •/'  )  - 
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where  y  =  Ax  —  b.  Hence  from  (27)  and  (29)  (or  direct  calculation),  the  gradient  and  Hessian 
of  this  barrier  term  are 


2 

9{x)  =  T  ATy 

1  -  VAJ 

(80) 

H(x)  =  AtA  +  g(x)g(x)T 

i-y  y 

(81) 

Given  x ,  the  cost  of  computing  y  and  1/(1  —  yTy)  is  raiY;  forming  g(x)  via  (80)  costs  mN, 
and  forming  H(x)  via  (81)  costs  m2N .  Hence  by  computing  g(x)  and  H(x)  in  this  way,  the 
cost  incurred  by  the  constraint  ||  Ax  —  b\\  <  1  is  m2N  per  Newton  step. 

Moreover,  suppose  that  we  precompute  and  store  the  Gram  matrix  A1 A.  Then  the  cost 
of  forming  H(x)  via  (81)  drops  to  to2,  so  the  overall  cost  incurred  by  the  block  is  to2.  (Note 
that  the  cost  of  the  precomputation,  he.,  forming  ATA,  is  m2N,  but  this  cost  is  amortized 
over  all  of  the  Newton  steps  performed  throughout  the  whole  algorithm.) 

This  computational  savings  can  also  be  understood  in  the  context  of  the  method  described 
in  section  6.3.  It  is  possible  to  derive  a  simple  explicit  expression  for  the  inverse  Cholesky 
factor  Z_1 .  We  save  computation  by  simply  evaluating  this  expression  rather  than  performing 
a  Cholesky  factorization  and  inversion. 

More  generally,  by  exploiting  the  special  structure  of  the  blocks  that  arise  from  the 
constraints  described  in  section  2,  we  can  lower  the  computational  cost  per  Newton  step 
below  the  “full”  block  cost  described  in  section  6.3,  although  in  many  cases  the  savings  is 
not  as  large  as  in  this  quadratic  constraint  example. 

6.5  Line  search 

We  noted  in  section  3.4  that  Nesterov  and  Nemirovsky’s  analysis  holds  for  exact  line  search 
step  length  selection,  he., 

a ^  :  =  argmin  (f>  (^x^  —  aH{x^)~1  g{x^)^j  (82) 

a 

With  exact  line  search,  the  number  of  iterations  required  to  compute  the  analytic  center  is 
typically  smaller  than  with  the  Nesterov-Nemirovsky  step  length  (44),  but  of  course  each 
iteration  involves  the  extra  computation  required  to  determine  the  step  length  a^k\  In  many 
cases,  there  is  an  overall  advantage  in  using  exact  line  search. 

We  need  to  compute 


a*  =  argmax{det(/  +  aP)\I  +  aP  >  0} 


(83) 


where 

P  =  YJViF(x)-1/2FlF(x)-1/2 

i 
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and  v  =  —  H(x)~1g(x).  (Note  the  similarity  to  the  standard  problem  of  computing  eigenvalues — 
the  difference  is  that  here,  we  want  to  compute  a  zero  of  the  derivative  of  the  characteristic 
polynomial  instead  of  the  characteristic  polynomial  itself.) 

We  first  reduce  P  to  tridiagonal  or  even  diagonal  form,  which  costs  J2ni-  Now  the 
derivatives  g(ct)  and  H(a)}  and  hence  the  Newton  step,  can  be  computed  at  a  cost  of  n. 
(This  reduction  is  used  in  the  algorithm  described  in  [NN90a].)  Several  methods  can  be  used 
to  find  a*,  e.g.,  we  can  use  Newton’s  method  with  the  Nesterov-Nemirovsky  step  length,  or 
bisect  until  S  <  1/4  and  then  switch  to  Newton’s  method.  For  this  latter  method,  it  can  be 
shown  that  in  the  worst  case  we  perform  no  more  than  (a  constant  times)  log  n  bisections 
to  reach  the  region  of  quadratic  convergence,  he.,  S  <  1/4  (after  which  we  perform  at  most 
a  small  fixed  number  of  iterations).  Thus,  in  the  worst  case  the  cost  of  computing  a*  is  no 
more  than  nlogn,  once  we  have  reduced  the  pencil.  So  the  cost  of  exact  line  search  is  at 
most  max{n  log  n,  )T)  rP- },  which  in  many  cases  is  small  compared  to  the  cost  of  computing 
the  Newton  direction. 


7  An  example 

7.1  A  Lyapunov  function  search  problem 

We  consider  a  simple  example  of  determining  a  Lyapunov  function  that  optimizes  a  decay 
rate  estimate  for  a  linear  differential  inclusion.  More  detail  on  this  and  similar  problems  can 
be  found  in  [BGFB93]  or  [BY89]. 

We  consider  the  differential  equation 

2/CO  (84) 

where  Gi  £  RWxiV  (and  do  not  depend  on  t)  and  the  Opt)  satisfy  =  1,  Opt)  >  0,  but 

are  otherwise  arbitrary. 

Given  any  P  =  PT  >  0,  let  V(z)  =  zTPz.  For  y(t)  satisfying  (84),  we  have 


■jW))  =  £»((<)y(<)T  (c<>  +  PGi) »(«) 

(85) 

<  max  y(t)T  (Gj P  +  PG^j  y(t ) 

(86) 

<  max  Amax  (GjP  +  PGt,P )  V(y(t)). 

(87) 

This  proves  that 

V(y(t))  <  eatV(y( 0)) 

(88) 

where 

/  L  L 

a  =  max  Amax  (gJ P  +  PGi,  P^j  =  Amax  (  @(Crf  P  +  PGi ),  P 

Vz  =  l  i= 1 
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(and,  moreover,  a  is  the  smallest  number  for  which  we  can  guarantee  that  dV / dt  <  aV 
regardless  of  yit )  and  the  particular  6i(t)). 

We  can  interpret  —a/2  in  (88)  as  a  conservative  stability  degree  estimate  or  guaranteed 
decay  rate  (if  a  <  0)  of  the  differential  inclusion  (84),  and  V(z)  =  zTPz  as  a  Lyapunov 
function  that  proves  it.  Our  problem  is  to  determine  the  Lyapunov  function  that  gives  the 
best  guaranteed  decay  rate  estimate  for  the  system  (84): 

minimize  Amax  P  +  PGi),  0  p)  . 

p  Q  V/=l  i= 1  / 

Since  the  objective  is  homogeneous  in  P  of  degree  zero,  we  can  normalize  P  by,  e.g.,  Tr P  = 
N.  We  also  impose  the  constraint  that  P  >  bmmI  where  1  >  Lin  >  0,  which  essentially 
limits  the  condition  number  of  the  Lyapunov  functions  we  are  willing  to  consider,  but  in 
most  cases  is  irrelevant  if  Lin  is  small  enough  (see[BY89]).  This  results  in: 

minimize  Amax  ((§)(Gj  P  +  PGi),  0  P\  .  (89) 

Xr  P  =  N  V;=i  i= i  / 

P  -  LniiJ  >  0 

When  this  problem  is  put  in  the  form  (8),  by  eliminating  the  equality  constraint,  we  find 
that  the  number  of  variables  is  m  =  N(N  +  l)/2  —  1,  the  size  of  the  matrices  A B  —  A  is 
r  =  LN ,  and  the  size  of  the  constraint  matrix  C  is  s  =  N .  The  matrix  P  =  (A B  —  A)  ®  C 
has  size  n  =  r  +  s  =  (P  +  l)iV,  and  consists  of  L  +  1  blocks  each  of  size  N. 

As  initial  feasible  point  we  can  take 

P(0)  =  /,  (90) 

A^°)  =  Amax  ^0(Cff  +  G/)^  +  1.  (91) 

Since  the  set  {P|TrP  =  N,  P  >  0}  is  bounded,  it  is  clear  that  all  the  assumptions  of 
section  1.5  are  satisfied.  Moreover  we  can  take  6max  =  N,  which  can  be  seen  as  follows. 
Since  P  >  0  and  TrP  =  N,  we  have  Amax(P)  <  N,  so  Amax  <  N. 

7.2  An  instance  of  the  problem 

In  the  next  two  sections  we  give  some  numerical  results  for  an  instance  of  the  problem  (89). 
We  consider  a  physical  system  consisting  of  two  unit  masses,  which  are  connected  to  each 
other  by  a  spring.  In  addition,  one  of  the  masses  is  connected  to  a  wall  (infinite  mass)  by 
another  spring.  The  two  springs  can  instantly  change  stiffness  over  the  range  of  [1,2].  By 
loosening  and  stiffening  the  springs  appropriately  we  can  pump  energy  into  our  system;  our 
task  is  derive  the  best  upper  bound,  based  on  a  quadratic  Lyapunov  function,  on  the  rate 
at  which  this  can  be  done. 
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With  y i  and  y2  denoting  the  positions  of  the  masses  and  y3  and  j/4  denoting  their  veloc¬ 
ities,  the  differential  inclusion  describing  this  system  is 


dy_ 

dt 


(t) 


0 

0 

h(t)  ~  hit) 
h(t) 


0 

0 

hit) 

-hit) 


1  0  ' 
0  1 
0  0 
0  0 


y(t), 


l  <  hit)  <2,  *  =  1,2. 


Thus  we  have  N  =  4  and  L  =  4  (representing  the  four  extreme  matrices).  There  are  9 
variables,  and  the  matrix  F  =  (A B  —  A)  ®  C  is  20x20,  and  consists  of  five  4x4  blocks.  The 
Lyapunov  function  P  is  initialized  as  /,  and  we  take  Lin  =  0.01  (which  limits  the  condition 
number  of  P  below  400).  The  optimal  Lyapunov  function  turns  out  to  have  a  minimum 
eigenvalue  of  about  0.42,  so  the  constraint  P  >  bmmI  is  (quite)  inactive.  The  optimum  value 
is  Aopt  =  0.6056,  which  has  multiplicity  four.  (The  multiplicity  is  split  between  one  active 
eigenvalue  corresponding  to  the  case  of  both  springs  loose,  he.,  k\  =  h  =  1,  and  one  active 
eigenvalue  corresponding  to  the  case  of  both  springs  tight,  he.,  k\  =  h  =  2-) 


7.3  Some  numerical  results:  method  of  centers 

The  table  below  shows  the  progress  of  the  method  of  centers  with  the  parameter  9  set  at 
0.001,  he.,  the  next  A  is  set  99.9%  of  the  way  towards  the  current  objective  value,  from  the 
current  value  of  A.  The  first  and  second  columns  show  the  iteration  number  and  objective 
value.  The  third  column,  labeled  gap  1,  shows  the  simple  bound  on  the  difference  between 
the  current  value  of  the  objective  and  the  optimal  value  from  the  simple  formula  (50),  he., 
(A(fc)-Amax(A(%fc)),I?(  %fc))))n&max/&m in.  The  fourth  column,  labeled  gap  2,  shows  the  better 
bound  obtained  using  (68).  The  next  column,  labeled  NeNe,  shows  the  number  of  Newton 
steps  that  were  required  to  compute  the  analytic  center  (he.,  the  current  iterate)  using  the 
Nesterov-Nemirovsky  step  length.  The  last  column  shows  the  number  of  Newton  steps  that 
were  required  to  compute  the  analytic  center  using  exact  line  search  step  length.  (In  both 
cases,  the  stopping  criterion  for  the  analytic  center  computation  is  8  <  0.001,  which  can  be 
shown  to  imply  that  (f>ix)  —  <fiix*)  <  0.0012.) 


0  =  0.001 


iteration 

^max 

gap  1 

gap  2 

NeNe 

LS 

1 

2.6511e  +  00 

1.03e  +  05 

6.21e  +  01 

5 

3 

2 

1.4645e  +  00 

4.76e  +  04 

3.36e  +  01 

12 

6 

3 

8.5597e  -  01 

2.44e  +  04 

6.05e  -  01 

18 

8 

4 

6.6293e  -  01 

7.74e  +  03 

4.32e  -  01 

17 

8 

5 

6.6093e  -  01 

8.76e  +  01 

5.79e  -  02 

9 

5 

6 

6.6061e  -  01 

1.29e  +  01 

8.74e  -  03 

15 

7 

7 

6.6057e  -  01 

1.82e  +  00 

1.27e  -  03 

14 

7 

8 

6.6056e  -  01 

2.88e  -  01 

2.02e  -  04 

7 

4 
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Five  iterations  (30  LS  Newton  steps)  are  required  to  reduce  the  objective  value  to  within 
0.001  of  the  optimal  value  (but  of  course,  we  don’t  know  this  at  the  fifth  iteration).  Eight 
iterations  (48  Newton  steps)  are  required  to  reduce  the  better  gap  (gap  2)  below  0.001. 

To  see  the  effect  of  the  parameter  0,  we  now  consider  the  large  value  6  =  0.5.  Note  that 
the  table  does  not  show  every  iteration. 


0  =  0.5 


iteration 

^max 

gap  1 

gap  2 

NeNe 

LS 

1 

2.6511e  +  00 

1.03e  +  05 

6.21e  +  01 

5 

3 

5 

1.0344e  +  00 

3.18e  +  04 

1.41e  +  00 

3 

3 

10 

6.6751e  -  01 

2.68e  +  03 

1.03e  +  00 

2 

2 

15 

6.6117e  -  01 

1.44e  +  02 

9.36e  -  02 

2 

2 

20 

6.6060e  —  01 

8.88e  +  00 

6.09e  -  03 

2 

2 

24 

6.6056e  -  01 

9.61e  -  01 

07) 

I— 1 
Cb 

1 

o 

2 

2 

As  expected,  convergence  is  slower — 15  iterations  (37  Newton  steps)  are  required  to  converge 
to  within  0.001  of  the  optimal  value  and  24  iterations  (55  Newton  steps)  are  required  to 
reduce  the  better  gap  below  0.001.  Also  as  expected,  the  number  of  Newton  steps  required 
to  compute  each  analytic  center  is  smaller  than  in  the  case  0  =  0.001,  since  the  initial  points 
for  the  analytic  center  computations  are  “more  feasible”  than  in  the  case  0  =  0.001.  Note 
that  the  total  numbers  of  Newton  steps  required  (37  and  55,  respectively)  are  not  much 
larger  than  the  numbers  required  in  the  case  0  =  0.001  (30  and  48,  respectively). 

We  now  consider  the  value  0  =  le-6,  which  is  very  nearly  the  classical  method  of  centers. 
The  results  are  shown  below: 


0  =  le  -  6 


iteration 

^max 

gap  1 

gap  2 

NeNe 

LS 

1 

2.6511e  +  00 

1.03e  +  05 

6.21e  +  01 

5 

3 

2 

1.4726e  +  00 

4.72e  +  04 

3.35e  +  01 

12 

6 

3 

8.6616e  -  01 

2.42e  +  04 

6.21e  -  01 

12 

6 

4 

6.6201e  -  01 

8.16e  +  03 

2.70e  -  01 

11 

5 

5 

6.6078e  -  01 

4.92e  +  01 

3.32e  -  02 

25 

11 

6 

6.6059e  -  01 

7.38e  +  00 

5.03e  -03 

10 

5 

7 

6.6056e  -  01 

1.04e  +  00 

7.28e  -  04 

31 

11 

While  the  convergence  is  essentially  the  same  as  for  the  case  6  =  0.001,  the  number  of 
Newton  steps  required  per  iteration  is  larger,  since  the  initial  points  for  the  analytic  center 
computations  are  “less  feasible”  than  in  the  6  =  0.001  case. 
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7.4  Some  numerical  results:  ellipsoid  algorithm 

For  comparison  we  solve  the  same  problem  using  the  ellipsoid  algorithm,  which  is  a  general 
algorithm  that  can  minimize  a  quasiconvex  function  subject  to  a  convex  constraint.  We 
give  a  brief  but  complete  description  of  the  algorithm  here.  More  details  can  be  found  in, 
e.g.,  [BB91]. 

The  ellipsoid  algorithm  must  be  initialized  with  an  ellipsoid  that  contains  a  minimizer. 
As  initial  ellipsoid  we  take 

£(0)  =  |  P  || P  -  I\\F  <  ^N(N  -  1),  TrP  =  N  | 

which  (by  our  outer  ellipsoidal  bound)  contains  the  set  of  positive  definite  matrices  with 
trace  iV,  and  so  contains  the  entire  feasible  set  for  our  problem. 

At  each  iteration,  we  produce  an  ellipsoid  of  smaller  volume  that  is  still  guaranteed  to 
contain  a  minimizer,  as  follows.  First  we  find  a  cutting-plane  that  separates  the  center 
of  the  current  ellipsoid  from  the  set  of  minimizers,  so  the  minimizer  is  now  localized  to  the 
intersection  of  a  half-space  and  the  current  ellipsoid.  Then,  the  next  ellipsoid  is  the  minimum 
volume  ellipsoid  that  contains  this  intersection.  (There  are  simple  formulas  for  this  update.) 

The  cutting-plane  is  computed  as  follows.  If  the  current  iterate  P G)  (which  is  the  center 
of  the  ellipsoid  £G)'j  is  not  feasible,  he.,  does  not  satisfy  P G)  >  bmmP  we  compute  the 
minimum  eigenvalue  of  P G)  along  with  a  corresponding  eigenvector  v  with  ||u||  =  1.  The 
cutting-plane  is  then  given  by  vTPv  =  Lin,  which  describes  a  hyperplane  in  {P|TrP  =  N}. 
In  other  words,  the  minimizer  is  contained  in  the  half-space 

{  P  |  vTPv  >  6min,  TrP  =  N  }  , 

since  P’s  not  in  this  half-space  are  surely  infeasible.  (p(fc)  is  not  in  this  half-space,  so 
intersecting  £G)  with  this  half-space  “cuts  away”  more  than  half  of  £G) .  For  this  reason  this 
is  called  a  “deep-cut.”  ) 

If  PG)  is  feasible,  he.,  satisfies  P G)  >  bmmP  then  we  generate  a  cutting-plane  from  the 
objective,  as  follows,  we  compute 

A(fc)  =  Amax  (0(GfP(fc)  +  P(fc)G'O,0^(fc)) 

\i=l  i=l  / 

along  with  a  corresponding  generalized  eigenvector  v  with  ||u||  =  1.  Then  any  minimizer 
must  lie  in  the  half-space 

L 

P  vt®(\WP  -  GjP  -  PGi )  v  >  0,  TrP  =  N 
8  =  1 

(since  any  other  P  will  either  be  infeasible  or  have  an  objective  value  larger  than  A^^).  In 
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this  case  we  can  also  compute  a  lower  bound  on  the  optimal  objective  value: 


Aopt  >  min 

Tr P  =  N,  P  e  £{k) 

vT  ®f=l  Pv  >  &min 


nT©f=1  (GfP  +  PGi)v 
vT  ®f=i  Pv 


(92) 


The  table  below  shows  the  progress  of  the  ellipsoid  algorithm.  The  column  labeled  gap 
shows  the  difference  between  A ^  and  the  lower  bound  (92).  (The  iterates  shown  in  the  table 
are  all  feasible.) 


iteration 

^max 

gap 

1 

4.2361e+00 

6.48e+02 

10 

6.8871e-01 

2.67e+02 

100 

6.7125e-01 

2.92e+01 

200 

6.6213e-01 

3.54e+00 

300 

6.6075e-01 

1.20e+00 

400 

6.6059e-01 

2.20e-01 

500 

6.6057e-01 

1.30e-02 

600 

6.6056e-01 

4.51e-03 

674 

6.6056e-01 

9.27e-04 

The  ellipsoid  algorithm  requires  190  iterations  to  converge  within  0.001  of  Aopt  and  674 
iterations  to  reduce  the  gap  below  0.001. 

7.5  Comparison 

For  this  problem,  the  computation  cost  of  an  ellipsoid  algorithm  iteration  is  less  than  but  still 
roughly  comparable  to  the  cost  of  a  Newton  step  and  line  search  in  the  method  of  centers. 

The  table  below  summarizes  the  numbers  of  Newton/line  search  steps  for  the  method  of 
centers,  and  the  number  of  iterations  for  the  ellipsoid  algorithm,  required  for  convergence 
within  0.001  of  the  optimal  value  (X^  —  Aopt  <  0.001)  and  for  reduction  of  the  gap  below 
0.001. 


criterion 

e  =  le-3 

6  =5e-l 

ell.  alg. 

\(k)  _  yopt  <  _001 

30 

37 

190 

gap  <  .001 

48 

55 

674 

We  should  make  several  comments  concerning  this  comparison.  First,  we  were  able 
to  initialize  the  ellipsoid  method  with  an  efficient  ellipsoid  (indeed,  the  minimum  volume 
ellipsoid  that  contains  {P\P  >  0,  TrP  =  N}).  In  the  general  problem,  no  such  efficient 
ellipsoid  is  available.  Second,  the  efficiency  of  the  method  of  centers,  as  compared  to  the 
ellipsoid  method,  rapidly  increases  with  problem  size. 
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8  Conclusions 


The  method  of  centers  is  a  simple  interior  point  algorithm  that  appears  to  be  very  efficient 
when  compared  to  other  algorithms  for  minimizing  the  maximum  generalized  eigenvalue  of 
a  pair  of  matrices  that  depend  affinely  on  a  decision  variable. 

We  do  not,  however,  present  the  algorithm  as  described  in  section  4.3  as  the  “fastest” 
measured  either  by  typical  practical  performance  or  by  bounds  on  worst  case  performance. 
In  particular,  the  algorithm  can  be  made  to  run  faster  using  standard  techniques,  three  of 
which  we  mention  here: 

•  First-order  predictor. 

It  is  possible  to  cheaply  compute  dx*/d\  at  X^k\  This  can  be  used  to  initialize  the 
Newton  algorithm  for  computing  x^k+1\  This  reduces  the  number  of  Newton  steps  per 
iteration. 

•  Weighted  analytic  centers. 

Let  v  >  1  be  some  integer.  Then  we  apply  the  method  of  centers  to  the  problem 
with  data  A,  B}  and  C  where  A  =  ®vi=1A  and  B  =  Of  course,  working 

with  v  “copies”  of  the  inequality  XB  —  A  >  0  does  not  change  the  optimal  value 
or  set  of  minimizers  for  the  problem.  In  effect,  we  substitute  the  barrier  function 
v  log  det(AH  —  A)-1  +  log  det  C~x  for  log  det(AH  —  A)-1  +  log  det  C~x . 

This  results  in  a  larger  reduction  of  A  per  iteration  but  more  Newton  steps  required 
per  iteration.  In  practice,  this  can  lead  to  substantially  faster  convergence  of  A ^  to 
A°pt  (measurec[  in  total  Newton  steps).  However,  the  dual  bounds  are  often  worse  than 
for  v  =  1. 

For  v  large,  the  method  of  centers  will  approach  an  analog  of  Dikin’s  affine  scaling 
algorithm  [Dik67]. 

•  Switching  to  a  quadratically  convergent  local  method. 

We  note  the  possibility  of  combining  the  method  of  centers  with  a  quadratically  con¬ 
vergent  local  method.  The  method  of  centers  identifies  the  active  eigenvalues  and 
eigenvectors  (via  the  dual  matrices  U  and  V)  as  it  proceeds  (or  more  precisely,  it  iden¬ 
tifies  the  branches  of  the  eigenvalue  functions  that  are  active  at  the  optimal  point  xopt). 
We  presume  that  once  these  active  eigenvalues  are  identified,  an  optimum  point  can 
be  computed  more  rapidly  by  switching  to  a  quadratically  convergent  method  such  as 
Overton’s  (see  [Ove88];  the  extension  to  the  generalized  eigenvalue  case  is  considered 
in  [Hae91]). 

We  have  not  given  a  complete  complexity  analysis  (worst  case  operation  count)  of  the 
algorithm,  since  we  have  not  given  any  bound  on  the  number  of  Newton  steps  required  to 
reach  (in  some  appropriate  approximate  sense)  the  analytic  center.  To  do  this  would  require 
modifying  the  algorithm  to  use  some  appropriate  approximate  analytic  center  instead  of 
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the  analytic  center  (which  of  course  cannot  be  computed  in  a  finite  number  of  steps  for 
n  >  6)  and  in  addition  restricting  9  to  be  close  enough  to  one  to  get  a  suitable  bound  on  the 
number  of  Newton  iterations  required  to  compute  the  approximate  center.  We  remind  the 
reader  that  in  [NN91b],  Nesterov  and  Nemirovsky  describe  a  potential  reduction  algorithm 
for  generalized  eigenvalue  minimization  and  give  a  complete  worst  case  complexity  analysis. 

In  any  case,  the  material  of  sections  2,  3,  and  6  only  concern  the  notion  of  the  analytic 
center  of  an  affine  matrix  inequality,  and  is  independent  of  the  method  of  centers. 
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A  Derivative  of  log  det  F 

The  derivatives  given  in  section  3.1  are  readily  derived  once  the  reader  knows  that 

log  det  (To  +  tFt)  =  TrF0_1Fi 

dt  t= o 

(assuming  of  course  that  det  F0  ^  0).  This  is  shown  as  follows. 

log  det  (To  +  fid)  =  log  det  ( F0  (/  +  tF^F^ 

=  log  det  F0  +  log  det 
=  log  det  F0  +  log  (l  +  tTrF^Ft  +  o(tfj 

from  which  (93)  follows. 

B  Proof  of  ellipsoidal  bounds 

Suppose  that  x  satisfies  F(x)  =  F0  +  YlT=i  xiF  >  0.  We  assume  without  loss  of  generality 
that  x  =  0  and  F0  =  /  (the  latter  by  multiplying  the  original  matrices  on  the  left  and 
right  by  T0  ).  From  the  formulas  for  the  gradient  and  Hessian  of  the  barrier,  we  have 
&•(())  =  -Tr/\  and  HtJ{ 0)  =  Tr 


(93) 


(94) 

(95) 

(96) 
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We  first  establish  the  inner  ellipsoidal  bound.  Suppose  that  zTH(0)z  <  1.  Since 

m 

zT H(0)z  =  J2  FiFj  =  \\F(z)  -  I \\2F  >  \\F(z)  -  1 1|2 

i,j  =  1 

we  conclude  that  ||i^(z)  —  I\\  <  1,  and  hence  F(z)  >  0.  This  proves  the  inner  ellipsoidal 
bound  (41).  Note  that  we  did  not  use  here  the  fact  that  0  is  the  analytic  center  of  the  matrix 
inequality.  Therefore  the  inner  ellipsoidal  approximation  holds  for  any  feasible  point  x: 

F(x)  >  0  and  (z  —  x)TH(x)(z  —  x)  <  1  =>  F(z)  >  0 

(see  Nesterov  and  Nemirovsky  [NN93]  for  a  generalization  to  any  self-concordant  barrier). 

Now  we  prove  the  outer  ellipsoidal  bound,  assuming  that  0  is  the  analytic  center  of 
F(x)  >  0,  so  that  Trih  =  0.  Then  for  any  z  £  Rm,  we  have  Tr  F(z)  =  TrF(O)  =  n. 
Similarly, 

m  m 

Tr F{z  f  =  ZiZjTrFiFj  +  2  ^  ztTrFt  +  Tr/  (97) 

i,j= 1  i= 1 

=  zTH(0)z  +  n.  (98) 

Now  suppose  that  z  satishes  F(z)  >  0.  For  any  X  =  XT  £  with  X  >  0,  TrX2  < 

(TrX)2  (this  can  be  seen  by  diagonalizing  X).  Therefore  we  have 

ztH(0)z  +  n  =  Tr F(z)2  <  (Tr F(z))2  =  n2  (99) 


so 

F(z)  >  0  ztF[(0)z  <  n(n  —  1) 
which  is  the  outer  ellipsoidal  bound. 

The  same  type  of  argument  can  be  used  to  derive  an  outer  ellipsoidal  bound  centered  at 
any  point  x  with  S(x)  <  1  (again,  see  Nesterov  and  Nemirovsky  [NN93]  for  a  generalization 
to  any  self- concordant  barrier).  We  proceed  as  follows.  Suppose  now  that  x  =  0  is  not 
necessarily  the  analytic  center.  Then  (99)  becomes: 

ztH(0)z  -  2g(0)Tz  +  n  =  Tr F(z)2  <  (Tr F(z))2  =  (n  -  g(0)Tz)2. 


This  implies  that 

zT  (i7(0)  —  5,(0)(/(0)t^)  z  +  2 (n  —  l)g(§)T z  <  n(n  —  1), 
which  can  be  put  in  the  form 

(z  -  xc)T  (h( 0)  -  ^(0M0)T)  (z  -  xc)  <  \ 

where 


(100) 


1  -/(0)2 

The  inequality  (100)  dehnes  an  ellipsoid  if  and  only  if  H( 0)  —  </(0)(/(0)T  >  0,  which  is  the 
same  as  <5(0)2  =  g(0)T//(0)-1g(0)  <  1.  The  center  of  the  ellipsoid,  xc,  is  displaced  along  the 
Newton  direction  from  the  point  x  =  0. 


32 


C  A  generalized  resolvent  inequality 

Suppose  that  Y  >  0  and  A Y  —  X  >  0,  and  let  Amax  =  Amax(X,  Y).  We  will  show  that 

1 


Tr(AF  —  X)  > 


(A  —  Amax)||y||  ’ 
Find  u  /  0  such  that  Xv  =  AmaxTT.  Then  we  have 


(A Y  -X)~1Yv  = 


A  -  Ar 


so  that 


Therefore 


(A  Y-Xj-'Y 
(A  Y-X)-1 


> 


A  -  Ar 


> 


(A  —  Amax)||y||)  ’ 


Our  conclusion  follows  from 


Tr(AF  —  X)-1  >  (A  Y-X) 


D  Maximum  of  a  linear  fractional  form  on  an  ellipsoid 

In  section  (5.1)  we  derive  an  upper  bound  for  A ^  —  Aopt  that  is  given  by  the  maximum  of  a 
linear  fractional  form  over  an  ellipsoid.  We  use  a  similar  bound  for  the  ellipsoid  algorithm 
in  (92).  Here  we  show  how  this  can  be  computed. 

By  a  suitable  change  of  coordinates  we  may  assume  that  the  problem  is  to  determine 


8  = 


max 
xT x  <  1 


T 

a  x 

cT  x  +  1 


We  will  assume  that  a  ^  0  (if  a  =  0  then  obviously  8  =  0). 

If  ||c||  >  1,  then  8  =  oo,  since  the  linear  fractional  form  is  unbounded  above  near 
x  =  — c/||c||. 

Assume  now  that  ||c||  <  1.  Then, 


8  <  7  aT x /  (cT x  +  1)  <  7  for  all  x  with 

( a  —  7 c)T x  <  7  for  all  x  with  ||x|| 
|| a  —  7c||  <  7. 


Therefore  8  is  equal  to  the  larger  root  of  the  quadratic  equation 


\\x\\  <  1 

(101) 

<  1 

(102) 

a  —  7c  2  =  7 2. 

(103) 
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In  the  case  ||c||  =  1,  the  solution  depends  on  the  sign  of  aTc.  (Although  this  case  is 
irrelevant  in  any  numerical  computation,  we  include  it  below  for  completeness.)  In  summary, 
the  solution  is: 


6 


(y\J(aTc)2  +  aTa(l  —  cT c)  —  aTcj  j  (1  —  cTc 
<  aTa/(2aTc ) 


||c||  <  1, 

||c||  =  1,  a1 c  >  0, 
otherwise. 


(104) 


E  Solution  of  the  constrained  problem 

Here  we  describe  the  solution  of  the  following  problem:  maximize  a  linear  fractional  form  over 
an  ellipsoid,  subject  to  an  upper  bound  on  the  numerator  and  a  (positive)  lower  bound  on 
the  denominator.  As  in  the  previous  section  we  change  coordinates  so  the  ellipsoid  becomes 
the  unit  ball.  The  problem  we  must  solve  assumes  the  following  form:  determine 

6  =  max 

xTx  <  1 
a1 x  <  a 
cTx  +  1  >  f3 


T 

a  x 


cT  x  +  1 


(105) 


In  our  problem,  a  and  f3  are  positive,  and  0  is  feasible,  he.,  1  >  f3. 

We  can  always  reduce  the  problem  (105)  to  a  two  dimensional  problem  (which  is  not 
surprising  since  the  two  affine  functions  aT x  and  cT x  +  1  do  not  vary  along  directions  lying 
in  a  subspace  of  dimension  n  —  2).  Using  a  Lagrange  multiplier  or  direct  argument,  it  can 
be  shown  that  a  maximizer  of  (105)  always  lies  in  the  span  of  a  and  c.  (This  is  clear  in 
the  case  where  the  Lagrange  multiplier  corresponding  to  the  constraint  xT x  <  1  is  nonzero, 
in  which  case  there  is  exactly  one  maximizer.  When  this  Lagrange  multiplier  is  zero,  the 
problem  (105)  can  have  multiple  maximizers,  one  of  which,  however,  lies  in  the  span  of  a 
and  c.) 

We  proceed  under  the  assumption  that  a  and  c  are  linearly  independent.  (If  they  are  not, 
the  problem  reduces  to  a  trivial  one  dimensional  problem.)  We  define  the  new  optimization 
variable  w  £  R2  given  by 

x  =  |  a  c  |  G~x 


where  G  is  the  Gram  matrix 


G  = 


T  T 
a  a  a  c 


T  T 
c  a  c  c 


Therefore, 


a 


T 


x  = 


cT  X  +  1  = 
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w  1, 
w2, 


and  the  constraint  xTx  <  1  becomes  w  £  £  where 


Our  problem  (105)  becomes:  determine 

8  =  max 

w  G  £ 
uq  <  a 
w2  >  f3 

The  solution  to  this  problem  is  readily  obtained  by  solving  a  few  quadratic  equations,  and 
so  has  essentially  no  computational  cost  (e.g.,  when  compared  to  the  computational  cost 
of  reducing  the  original  problem  to  the  five  parameters  appearing  in  (106)).  Its  solution  is 
cumbersome  to  describe,  however. 

We  first  note  that  the  solution  must  lie  in  the  first  quadrant  (uq  >  0,  w2  >  0),  and  on 
the  boundary  of  the  feasible  set  £  fl  {rc|rci  <  a,  w2  >  f3}.  We  distinguish  several  cases: 

Case  I:  [a  f3]T  £  £. 

In  this  case  the  maximizer  is  the  point  w1  =  [ a  /3]T,  and  we  have  8  =  a/ [3.  Henceforth  we 
assume  that  [a  f3]T  ^  £. 

Case  II:  [a  f3]T  ^  £,  0  G  £. 

In  this  case  the  maximizer  lies  on  the  line  segment  {w\w2  =  f3}  D  £  (which  is  readily  found 
by  solving  a  quadratic  equation;  the  assumptions  ensure  that  the  line  segment  is  nonempty). 
The  maximizer  is 


w  i 
W2 


(106) 


w 


(P) 


(ft  —  1  )aTc  +  (cTc  —  (ft  —  l)2)  det  G  )  j  cTc 


Henceforth  we  assume  that  0  ^  he.,  a1  a  >  det  G. 

We  now  compute  w7  the  local  maximum  of  wi/w2  on  d£  which  satisfies  uq  >  0,  w2  ft  0. 
By  solving  a  quadratic  equation  we  find: 


V aTa  —  det  G 

1  +  (aTcftaTa  —  det  G  —  det  G^j  j  aTa 

We  distinguish  three  more  cases  depending  on  w. 

Case  III:  [a  f3]T  ^  £,  0  ^  £,  hq  <  a,  w2  >  f3. 

The  condition  is  simply  that  w  is  feasible.  In  this  case,  w  is  the  maximizer. 

Case  IV:  [a  f3]T  ^  £,  0  ^  £,  iiq  >  a,  w2  >  f3. 

The  condition  is  that  w  violates  violates  the  hrst  linear  constraint.  In  this  case  the  maximizer 
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lies  on  the  line  segment  {u;|u;i  =  a}  fl  £.  By  solving  a  quadratic  equation  we  find  the 

maximizer  as: 

a 

1  +  ( aaTc  —  \J ( aT a  —  a2)  det  Gj  j  aT a 

Case  V:  [a  f3]T  £,  0  ^  £,  hq  <  a,  w2  <  fl. 

If  w  violates  the  second  linear  constraint,  then  the  maximizer  lies  on  the  line  segment  {w\w2  = 

f3}  fl  £.  In  this  case  the  maximizer  is 
In  summary,  the  solution  is  given  by 

a/ f3  Case  I 

((/3  —  l)aTc  +  \J (cTc  —  (J3  —  l)2)  det  G  ^  j  ( [3cTc )  Case  II  or  V 
(aaTa)  j  ( aT a  +  aaTc  —  \J ( aTa  —  a2)  det  G  )  Case  IV 
(aT a)  j  (aT c  +  V aTa  —  det  G  )  Case  III 
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